/*******************************************************************************
Author: John Iselin
Date Updated: September 26, 2019

Goldman School of Public Policy 
Masters in Public Policy Program
Graduate Student Researcher for Hilary Hoynes

Modified from main do file from Hoynes, Schanzenbach and Almond for creating 
PSID data.

This cleaning process has the following steps: 
(1) Take the individual data and re-name the required variables
(2) Restrict variables, reshape, and remove immigrant sample
(3) Create appropriate Education and Year of Birth variables
(4) Merge the Public PSID Data with the Restricted Geographic data
(5) Merge the Combined Public - Restricted Data with FSP Rollout Data

Any questions can be directed to John Iselin at johniselin@berkeley.edu

*******************************************************************************/

* Set-Up
clear
set maxvar 6000
set linesize 200

********************************************************************************
***** (1) Rename variables from public individual-level dataset ****************
********************************************************************************

** Load the data
use ${data}psid_individual, clear 

** 	Variable 1: Sequence Number
** 				The sequence number conveys the relationship between the ind.
**				and the family unit during the time of the interview. 
** 	Note: 		In 1968 there was no sequence number, so the relation to HoH var
** 				is used instead. 

gen seqnum1968 = ER30003
gen seqnum1969 = ER30021
gen seqnum1970 = ER30044
gen seqnum1971 = ER30068
gen seqnum1972 = ER30092
gen seqnum1973 = ER30118
gen seqnum1974 = ER30139
gen seqnum1975 = ER30161
gen seqnum1976 = ER30189
gen seqnum1977 = ER30218
gen seqnum1978 = ER30247
gen seqnum1979 = ER30284
gen seqnum1980 = ER30314
gen seqnum1981 = ER30344
gen seqnum1982 = ER30374
gen seqnum1983 = ER30400
gen seqnum1984 = ER30430
gen seqnum1985 = ER30464
gen seqnum1986 = ER30499
gen seqnum1987 = ER30536
gen seqnum1988 = ER30571
gen seqnum1989 = ER30607
gen seqnum1990 = ER30643
gen seqnum1991 = ER30690
gen seqnum1992 = ER30734
gen seqnum1993 = ER30807
gen seqnum1994 = ER33102
gen seqnum1995 = ER33202
gen seqnum1996 = ER33302
gen seqnum1997 = ER33402
gen seqnum1999 = ER33502
gen seqnum2001 = ER33602
gen seqnum2003 = ER33702
gen seqnum2005 = ER33802
gen seqnum2007 = ER33902
gen seqnum2009 = ER34002
summ seqnum* 

** 	Variable 2: Sample Weights
** 				We use different weights for different years
* 	2009: 		COMBINED CORE-IMMIGRANT SAMPLE WEIGHT. 
* 	1968-1992: 	Individual Data Index 01 > SAMPLE WEIGHT 02 > Individual 03
* 				> Core
* 	1997-2009: 	Individual Data Index 01 > SAMPLE WEIGHT 02 > Individual 03
*				> Individual #1(1997+)
* 	1993-1996: 	Individual Data Index 01 > SAMPLE WEIGHT 02 > Individual 03
* 				> Core Longitudinal
* 				Only the longitudinal core weight is available (we are dropping 
*					the Latino sample; no stfipsYOB)

gen weight2009 = ER34045
gen weight2007 = ER33950
gen weight2005 = ER33848
gen weight2003 = ER33740
gen weight2001 = ER33637
gen weight1999 = ER33546
gen weight1997 = ER33430
gen weight1996 = ER33318
gen weight1995 = ER33275
gen weight1994 = ER33119 
gen weight1993 = ER30864
gen weight1992 = ER30803 
gen weight1991 = ER30730 
gen weight1990 = ER30686 
gen weight1989 = ER30641 
gen weight1988 = ER30605 
gen weight1987 = ER30569 
gen weight1986 = ER30534 
gen weight1985 = ER30497 
gen weight1984 = ER30462 
gen weight1983 = ER30428 
gen weight1982 = ER30398 
gen weight1981 = ER30372 
gen weight1980 = ER30342 
gen weight1979 = ER30312 
gen weight1978 = ER30282 
gen weight1977 = ER30245 
gen weight1976 = ER30216 
gen weight1975 = ER30187 
gen weight1974 = ER30159 
gen weight1973 = ER30137 
gen weight1972 = ER30116 
gen weight1971 = ER30090 
gen weight1970 = ER30066 
gen weight1969 = ER30042 
gen weight1968 = ER30019 

** 	Variable 3: Interview number
** 				The interview number is the family ID number. It is unique for 
**				each family unit and year. 

gen inum1968 = ER30001
gen inum1969 = ER30020
gen inum1970 = ER30043
gen inum1971 = ER30067
gen inum1972 = ER30091
gen inum1973 = ER30117
gen inum1974 = ER30138
gen inum1975 = ER30160
gen inum1976 = ER30188
gen inum1977 = ER30217
gen inum1978 = ER30246
gen inum1979 = ER30283
gen inum1980 = ER30313
gen inum1981 = ER30343
gen inum1982 = ER30373
gen inum1983 = ER30399
gen inum1984 = ER30429
gen inum1985 = ER30463
gen inum1986 = ER30498
gen inum1987 = ER30535
gen inum1988 = ER30570
gen inum1989 = ER30606
gen inum1990 = ER30642
gen inum1991 = ER30689
gen inum1992 = ER30733
gen inum1993 = ER30806
gen inum1994 = ER33101
gen inum1995 = ER33201
gen inum1996 = ER33301
gen inum1997 = ER33401
gen inum1999 = ER33501
gen inum2001 = ER33601
gen inum2003 = ER33701
gen inum2005 = ER33801
gen inum2007 = ER33901
gen inum2009 = ER34001

** Variable 4: 	Person Number (1968 Only)
** 				Used to keep track of how people get involved in the sample to 
** 				begin with.  

rename ER30002 person1968

** Variable 5: 	Relation to Household Head
** Note: 2009 Relationship to Head
* Note that these relationships are those to the 2007 Head for any individual 
* whose 2009 sequence number (ER34002) is greater than 50, that is, has moved 
* out of the FU. Thus, for example, if the 2007 Head is no longer present at the 
* time of the 2009 interview, his or her relationship to Head is coded 10; the  
* new 2009 Head also is coded 10. Therefore, to select current Heads, the user 
* must select those coded 10 in this variable whose sequence numbers (ER34002) 
* are coded 01.
*	 HH adds: wives would not have seqnum == 01, so to be safe use seqnum 1-20.
* 	 HH adds: the one thing I am not sure I understand is that seqnum == 0 if 
* 		in the Latino sample, so I am not sure that my numbers will add up.


local n = 1
foreach y of numlist 1968(1)1997 1999(2)2009 {
	local headVars : word `n' of ///
		ER30003 ER30022 														///
		ER30045 ER30069 ER30093 ER30119 ER30140 ER30162 ER30190 ER30219 ER30248 ///
				ER30285 														///
		ER30315 ER30345 ER30375 ER30401 ER30431 ER30465 ER30500 ER30537 ER30572	///
				ER30608															/// 
		ER30644 ER30691 ER30735 ER30808 ER33103 ER33203 ER33303 ER33403 ER33503 ///
		ER33603 ER33703 ER33803 ER33903 ER34003 
	gen head`y' = .
	if `y'>=1983{
		replace head`y' = 1 if 		///
				`headVars' == 10 & 	///
				(seqnum`y'>=1 & seqnum`y'<=20)
		replace head`y' = 2 if 							///
				(`headVars' == 20 | `headVars' == 22) & ///
				(seqnum`y'>=1 & seqnum`y'<=20)
		}
	if `y'<1983{
		replace head`y' = 1 if `headVars' == 1 & (seqnum`y'>=1 & seqnum`y'<=20)
		replace head`y' = 2 if `headVars' == 2 & (seqnum`y'>=1 & seqnum`y'<=20)
		}

* 	HH 8-23-12 Check to see that there is at most one head, one spouse, and any 
* 		number of non-head, non-spouse members 
*   NOTE the counts behind counthd and countsp include non-head, non-spouse obs, 
* 		so N is larger than the number of heads

	di "CHECK NUMBER OF HEADS PER HHOLD, SHOULD ALWAYS = ONE `y'"
	gen headdum=1 if head`y'==1
	bysort inum`y': egen counthd=count(headdum)
	tab counthd if inum`y'>0, m
	di "CHECK NUMBER OF SPOUSES PER HHOLD, AT MOST ONE `y'"
	gen spdum=1 if head`y'==2
	bysort inum`y': egen countsp=count(spdum)
	tab countsp if inum`y'>0, m
	drop counthd countsp headdum spdum 
	local n = `n' + 1
}

** 	Variable 6: Age at time of interview 
** 				For all years, used to capture observations not in 2009 

local n = 1 
foreach y of numlist 1968(1)1997 1999(2)2009 {
	local ageVars : word `n' of ///
		ER30004 ER30023 														///
		ER30046 ER30070 ER30094 ER30120 ER30141 ER30163 ER30191 ER30220 ER30249 ///
				ER30286															/// 
		ER30316 ER30346 ER30376 ER30402 ER30432 ER30466 ER30501 ER30538 ER30573 ///
				ER30609 														///
		ER30645 ER30692 ER30736 ER30809 ER33104 ER33204 ER33304 ER33404 ER33504 ///
		ER33604 ER33704 ER33804 ER33904 ER34004
	gen age`y' = `ageVars'
	gen ageCB`y' = . 
	replace ageCB`y' = 1 if age`y' == 1
	replace ageCB`y' = 2 if age`y' >= 2 & age`y' <= 125
	replace ageCB`y' = 3 if age`y' == 999
	replace ageCB`y' = 4 if age`y' == 0
	tab ageCB`y', m
	drop ageCB`y'
	summ age`y' if age`y' != 0 & age`y' != 999
	summ age`y' if weight`y' > 0
	local n = `n' + 1
}

** 	Variable 7: Year of Birth
** 	Variable 8: Month of Birth

local n = 1 
foreach y of numlist 2009(-2)1999 1997(-1)1983 {
**	Year 
	local yobVars : word `n' of	///
		ER34006 ER33906 ER33806 ER33706 ER33606									///
		ER33506 ER33406 ER33306 ER33206 ER33106 ER30811 ER30738 ER30694 ER30647	///
		ER30611 ER30575 ER30540 ER30503 ER30468 ER30434 ER30404

	if `y'==2009 {
		gen yob = `yobVars'
	}
	
	replace yob = `yobVars' if yob==. | yob==0 | yob==9999 
	
** 	Month 
	local mobVars : word `n' of ///
		ER34005 ER33905 ER33805 ER33705 ER33605 								///
		ER33505 ER33405 ER33305 ER33205 ER33105 ER30810 ER30737 ER30693 ER30646 ///
		ER30610 ER30574 ER30539 ER30502 ER30467 ER30433 ER30403	
		
	if `y'==2009 {
		gen mob = `mobVars'
	}
	
	replace mob = `mobVars' if mob==. | mob==0 | mob==99 
	summ mob if weight2009> 0
	local n = `n' + 1
}

** 	Variable 9: 	Highest Grade Completed
**					Top-Coded at 17

local n = 1
foreach y of numlist 1968 1970(1)1997 1999(2)2009 {
	
	local eduVars : word `n' of ///
	ER30010 																	///
	ER30052 ER30076 ER30100 ER30126 ER30147 ER30169 ER30197 ER30226 ER30255 	///
			ER30296 															///
	ER30326 ER30356 ER30384 ER30413 ER30443 ER30478 ER30513 ER30549 ER30584 	///
			ER30620 															///
	ER30657 ER30703 ER30748 ER30820 ER33115 ER33215 ER33315 ER33415 ER33516 	///
	ER33616 ER33716 ER33817 ER33917 ER34020

	dis "`eduVars'"
	gen educ`y' = .
	replace educ`y' = `eduVars' if `eduVars' <= 17
	dis "test"
	summ educ`y' if educ`y' > 0
	gen eduCB`y' = .
	replace eduCB`y' = 1 if `eduVars' >= 1 & `eduVars' <= 16
	replace eduCB`y' = 2 if `eduVars' == 17 
	replace eduCB`y' = 3 if `eduVars' == 98 
	replace eduCB`y' = 4 if `eduVars' == 99 
	replace eduCB`y' = 5 if `eduVars' == 0 
	tab eduCB`y', m
	drop eduCB`y'
	summ educ`y' if weight`y' > 0
	local n = `n' + 1
}

** 	Variable 10: 	Sex
gen sex = ER32000
label values sex sexlbl
label define sexlbl 1 "Male", add
label define sexlbl 2 "Female", add
tab sex, m
replace sex = . if sex == 9


********************************************************************************
***** (2) Restrict variables, reshape, and remove immigrant sample *************
********************************************************************************

*** Keep select variables 
keep inum* weight* seqnum* head* age* yob mob educ* sex person1968  
summ

** Create ID variable via person1968 and inum1968
egen id = group(person1968 inum1968)

** Reshape file to person-year level 
reshape long inum weight seqnum head age educ, i(id) j(year)
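
** Suggested check (an addition, not part of the original pipeline): after the 
** reshape each row should be one person-year, so id and year together should 
** uniquely identify observations. isid exits with an error if they do not. 
isid id year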

sort id year 
by id: gen count_id = _n

gen pos_weight = (weight != 0)
label variable pos_weight "Tag for positive weight in given year"

gen count_id_weight = count_id if pos_weight == 1

by id: egen min_count_id_weight = min(count_id_weight)

gen first_weight_obs =	(min_count_id_weight == count_id_weight) & 	///
						(pos_weight == 1)
label variable first_weight_obs "First observation with positive weight"

drop min_count_id_weight 
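
** Suggested check (an addition, not part of the original pipeline): count_id is 
** unique within id, so at most one row per person should be tagged as the 
** first positive-weight observation. Assumes the data are still sorted by id. 
tempvar n_first
by id: egen `n_first' = total(first_weight_obs)
assert `n_first' <= 1
drop `n_first'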

tab year 
tab year if count_id == 1
tab year if first_weight_obs == 1
tab year [aw = weight]
tab year if count_id == 1 [aw = weight]
tab year if first_weight_obs == 1 [aw = weight]

** Limit Sample
** Drop those from the immigrant sample in 1997/1999 via 1968 interview number
** Notes: 
** 		-> 	1 - 2,930: Member of, or moved into, a family from the 1968 SRC 
** 			cross-section sample
**		-> 	3,001 - 3,511: Member of, or moved into, a family from the Immigrant 
**			sample added in 1997 and 1999. Values of 3001-3441 indicate families 
**			first interviewed in 1997; values of 3442-3511 indicate families not 
**			interviewed until 1999.
**		-> 	5,001 - 6,872: Member of, or moved into, a family from the 1968 
**			Census sample
**		-> 	7,001 - 9,308: Member of, or moved into, a family from the Latino 
**			sample added in 1990 and 1992. Values of 7001-9043 indicate families 
**			first interviewed in 1990; values of 9044-9308 indicate families not 
** 			interviewed until 1992. 

gen temp = inum if year == 1968
by id: egen inum1968 = max(temp)
drop temp 

drop if inum1968 >= 3001 & inum1968 <= 3511	
drop if inum1968 > 7000
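
** Suggested check (an addition, not part of the original pipeline). Stata 
** treats missing as larger than any number, so the second drop above would 
** also remove any row with a missing inum1968. What remains should come only 
** from the 1968 SRC and Census samples. 
assert inum1968 <= 7000 & !inrange(inum1968, 3001, 3511)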

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

********************************************************************************
***** (3) Create Education, Age, and Year of Birth Variables *******************
********************************************************************************

**** Education 

** 	Identify the observations with missing education 
tab educ, m  

** Identify where the missing values are
tab year educ if educ == 0 | educ == . , m

** For 1969, use 1968 education value 

gen educ_1968 = educ if year == 1968
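** The missing option keeps educ1968 missing (rather than 0) when the 1968 value 
** is missing, so those cases fall through to the mode fill below 
by id: egen educ1968 = total(educ_1968), missing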

replace educ = educ1968 if 		///
		year == 1969 

tab year educ if educ == 0 | educ == . , m		

** For other years, mode of years in sample

count if educ==.
egen educmode=mode(educ), by(id) maxmode
replace educ=educmode if educ==. & educmode~=.
count if educ==.
drop educmode educ1968 educ_1968

sort inum year  

tab year educ if educ == 0 | educ == . , m

* Replace head = 3 if not head or spouse
replace head = 3 if head == .

** Create mother + father tags
gen mother = (sex == 2 & head != 3)
gen father = (sex == 1 & head != 3)

** Create parental education var (mother, then father if no mother)
gen mother_educ_test = educ if mother == 1
gen father_educ_test = educ if father == 1

bysort inum year: egen mother_present = total(mother)
tab mother_present  /* check number of mothers per inum */
drop if mother_present > 1 /* Same sex couple */

bysort inum year: egen father_present = total(father)
tab father_present /* check number of fathers per inum */
drop if father_present > 1 /* Same sex couple */

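** The missing option leaves parental education missing (rather than 0) when 
** the parent's own education is missing 
bysort inum year: egen mother_education = total(mother_educ_test), missing
bysort inum year: egen father_education = total(father_educ_test), missing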

gen parental_education = mother_education if mother_present == 1
replace parental_education = father_education if mother_present == 0 

sum parental_education


**** Age and Year of Birth 
sort id year

** Drop individuals for whom we never have age or YOB information
sum age yob

* Generate tag for each observation with no age (or yob) information 
gen missing_age = (age == 999) | (age == 0) 
gen missing_yob = (yob == 9999) | (yob == 0)

* Tag 
by id: egen min_missing_age = min(missing_age)
by id: egen min_missing_yob = min(missing_yob)

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

** SAMPLE RESTRICTION
drop if min_missing_age == 1 & min_missing_yob == 1

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

*drop if max_yob == 0 & max_age == 0
*drop max_* min_*

tab age, m
tab yob, m

** Fill in missing YOB information with other YOB values 
**		a. Other YOB values from same individual 
**		b. Age 
** Then re-calculate Age based on YOB

gen alt_yob = yob 

replace alt_yob = . if alt_yob == 0
replace alt_yob = . if alt_yob == 9999	

gen tag = (alt_yob == . )
gen tag2 = 1

by id: egen count_missingYOB = total(tag)
by id: egen count_all = total(tag2)

gen tag_noYOB = count_missingYOB == count_all

** The only individuals missing YOB information are missing YOB for ALL years
tab count_missingYOB
tab tag_noYOB


** And YOB Does not change across observations 
by id: egen max_yob_2 = max(alt_yob) 
by id: egen min_yob_2 = min(alt_yob)  

count if max_yob_2 == min_yob_2 
count if max_yob_2 != min_yob_2
tab max_yob_2 if max_yob_2 == min_yob_2 , m
tab max_yob_2 if max_yob_2 != min_yob_2 , m

drop max_yob_2 min_yob_2 tag tag2

** Use Age to fill in YOB when possible 
tab age if tag_noYOB == 1

gen alt_yob_from_age = year - age - 1 if 	///
	age != 999 & age != 0

* Tag if alt_yob_from_age is useful (missing yob) and usable (we have age data)
gen age_yob_tag = alt_yob == . & 		///
		age != 999 & 					///
		age != 0 

* Find the most common of new yob var 
bysort id: egen yob_mode = mode(alt_yob_from_age) ///
	if tag_noYOB == 1, maxmode

* Replace YOB if all observations are missing YOB
replace alt_yob	= yob_mode if 		///
	alt_yob == . & 					///
	tag_noYOB == 1
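
** Suggested diagnostic (an addition, not part of the original pipeline): how 
** many person-years with no recorded YOB received an age-based value, and how 
** many are still missing because no usable age was ever reported. 
count if tag_noYOB == 1 & !missing(alt_yob)
count if tag_noYOB == 1 & missing(alt_yob)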

** Check for observations where: 
** (1) 	We predict they were born after 1968, and
** (2) 	they are recorded as head or spouse in 1986 or earlier, i.e. before 
**		even the oldest of them could have turned 18. 

tab year tag_noYOB if head < 3 & alt_yob > 1968, m

gen drop_tag_1 = (head < 3 & alt_yob > 1968 & year <= 1986) | (alt_yob == .)

tab year drop_tag_1	

by id: egen drop_tag_2 = max(drop_tag_1)

tab year drop_tag_2 if year == 1968
tab alt_yob	, m

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]
 
** SAMPLE RESTRICTION
drop if drop_tag_2 == 1

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

tab alt_yob	, m
count

** Generate alternative age measure for use: 
gen alt_age = year - alt_yob 

* Examine Age Variable
replace alt_age = -1 if alt_age < 0 
tab alt_age , m

* Generate Age Buckets
gen alt_age_bucket = .
label define AGE_BUCKET -1 "Unborn", replace
replace alt_age_bucket = -1 if alt_age == -1

label define AGE_BUCKET 1 "Ages 0 to 2", add
replace alt_age_bucket = 1 if alt_age >= 0 & alt_age < 3

label define AGE_BUCKET 2 "Ages 3 to 5", add
replace alt_age_bucket = 2 if alt_age >= 3 & alt_age < 6

label define AGE_BUCKET 3 "Ages 6 to 11", add
replace alt_age_bucket = 3 if alt_age >= 6 & alt_age < 12

label define AGE_BUCKET 4 "Ages 12 to 15", add
replace alt_age_bucket = 4 if alt_age >= 12 & alt_age < 16

label define AGE_BUCKET 5 "Ages 16 to 18", add
replace alt_age_bucket = 5 if alt_age >= 16 & alt_age < 19

label define AGE_BUCKET 6 "19+", add
replace alt_age_bucket = 6 if alt_age >= 19 & !missing(alt_age)

label values alt_age_bucket AGE_BUCKET
 
tab alt_age_bucket, m
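
** Suggested check (an addition, not part of the original pipeline): the 
** buckets are meant to cover -1 and every age from 0 upward, so any row with a 
** non-missing alt_age should have been assigned a bucket. 
assert !missing(alt_age_bucket) if !missing(alt_age)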
		
		
* Examine YOB Variable 
sum alt_yob if first_weight_obs == 1
tab alt_yob if first_weight_obs == 1


*** Find INUM for earliest Geocode match 
**	(1) Find INUM in YOB for matching (unless born in 1969)
gen inumYOB = .
replace inumYOB = inum if 			///
		alt_yob == year & 			///
		alt_yob != . & 				///
		alt_yob != 1969	
		
by id: egen assigned_yob_tag = total(inumYOB)
		
label variable inumYOB 				///
	"Interview number for family in first sample obs."
	
**	(2) Find INUM in first year in sample with positive weight (except 1969)

gen count_id_temp = count_id if weight > 0 & year != 1969
by id: egen min_count_id = min(count_id_temp) 

replace inumYOB = inum if 			///
		assigned_yob_tag == 0 & 	///
		min_count_id == count_id 

		
tab year if inumYOB != .
tab alt_yob if inumYOB != . 
tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

** Save the Individual file	
keep id year person1968 seqnum weight inum head age yob mob educ sex count_id 	///
		inum1968 alt_age alt_yob parental_education inumYOB alt_age_bucket		///
		first_weight_obs
compress

sort inum year
save ${data}psid_individual_clean, replace
clear

********************************************************************************
***** (4) Merge the Public PSID Data with the Restricted Geographic data *******
********************************************************************************

**			Note:
**			This section uses the confidential data. 
**			The file that generates the confidential data (create_sens.do) reads
**			in geocode file (simple fam x year file) and merges with various 
**			county files (FSP, CCDB, REIS). The output is senspsid.dta.

use ${data}psid_geo, clear

* Rename variables we need 
rename * , lower
rename year10 year
rename famid10 inum
rename state10 state
rename county10 county	


* keep the variables we want for now
keep inum year state county 

gen inumYOB = inum
gen stateYOB = state
gen countyYOB = county

compress
sort inum year
save ${data}sensTEMP, replace

clear

** Merge sensTEMP with psid_individual_clean twice
use ${data}psid_individual_clean, clear


** Once on inum-year
merge m:1 inum year using ${data}sensTEMP, 		///
		gen(inum_merge) 						///
		keepusing(state county)

** Once on inumYOB-year 
merge m:1 inumYOB year using ${data}sensTEMP, 	///
		gen(inumYOB_merge)						///
		keepusing(stateYOB countyYOB)	
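
** Suggested diagnostic (an addition, not part of the original pipeline): 
** cross-tabulate the two merge indicators before unmatched rows are dropped. 
** Codes follow Stata's _merge convention: 1 = master only, 2 = using only, 
** 3 = matched. Many master-only rows are expected on inumYOB_merge, because 
** inumYOB is only filled in on selected rows for each person. 
tab inum_merge inumYOB_merge, m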

** Drop state and counties that were unmatched		
drop if inum_merge == 2 | inumYOB_merge == 2

drop inum_merge inumYOB_merge

** Assign individuals their 1968 state and county for 1969 
** If missing 1968, use 1970  

sort id year 

foreach var of varlist state county  {

	gen `var'_1968 = `var' if year == 1968 
	gen `var'_1970 = `var' if year == 1970
	by id: egen `var'1968 = max(`var'_1968)
	by id: egen `var'1970 = max(`var'_1970)

	replace `var' = `var'1968 if 	///
			`var' == . 	&			/// Missing state / county 
			year == 1969 & 			/// In 1969
			`var'1968 != . 			//	And not missing 1968 state /county
			
	replace `var' = `var'1970 if 	///
			`var' == . 	&			/// Missing state / county 
			year == 1969 & 			/// In 1969
			`var'1970 != . 			//	And not missing 1970 state /county
	
	drop `var'1968 `var'_1968 `var'1970 `var'_1970
} 
** Assign State and County of Birth to Rest of years

bysort id: egen temp1 = max(stateYOB)
bysort id: egen temp2 = max(countyYOB)

replace stateYOB = temp1 if stateYOB == .
replace countyYOB = temp2 if countyYOB == . 

drop temp1 temp2		
		
rename state stfips
rename county countyfips	
rename stateYOB stfipsYOB
rename countyYOB countyfipsYOB	
		
tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]
		
** SAMPLE RESTRICTION

** Keep if not missing calculated YOB variable 
drop if alt_yob == .

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

** Keep if not missing state and county geocode information (Current and YOB)

keep if stfips != .
keep if countyfips != .
keep if stfipsYOB != .
keep if countyfipsYOB != .

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

** Keep if in age sample (Born between 1950 and 1980)

keep if alt_yob >= 1950
keep if alt_yob <= 1980 

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]
	
** Keep if already born (alt_yob <= year) 

keep if alt_yob <= year
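
** Suggested check (an addition, not part of the original pipeline): confirm 
** that the restrictions above left only 1950-1980 birth cohorts, observed in 
** or after their year of birth. 
assert inrange(alt_yob, 1950, 1980) & alt_yob <= year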

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]	
sum
des

save ${data}psid_individual_wsens, replace

********************************************************************************
***** (5) Merge the Combined Public - Restricted Data with FSP Rollout Data ****
********************************************************************************

** Prepare FSP data
use ${data}fsrollout, clear

gen stfipsYOB = stfips
gen countyfipsYOB = countyfips
gen statenameYOB = statename
gen countynameYOB = countyname
gen fs_monthYOB = fs_month
gen fs_yearYOB = fs_year

save ${data}fsrollout_YOB, replace

* Load Combined Public - Restricted Data

use ${data}psid_individual_wsens, clear

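*** FIPS EDIT - FLORIDA 
*** Dade County (FIPS 12025) was renamed Miami-Dade (FIPS 12086); recode to 025 
*** Assumes stfips / stfipsYOB hold state FIPS codes (Florida = 12) 
replace countyfipsYOB = 25 if countyfipsYOB == 86 & stfipsYOB == 12
replace countyfips = 25 if countyfips == 86 & stfips == 12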

** Merge in FSP data on County and State of Birth

merge m:1 stfipsYOB countyfipsYOB using ${data}fsrollout_YOB, 	///
		gen(fsp_yob_merge)										/// 
		keepusing(statenameYOB countynameYOB 					///
		fs_monthYOB fs_yearYOB)

drop if fsp_yob_merge == 2		
		
tab stfips fsp_yob_merge, m	

** Merge in FSP data on Current County and State 

merge m:1 stfips countyfips using ${data}fsrollout_YOB, 	///
		gen(fsp_merge)										/// 
		keepusing(statename countyname 						///
		fs_month fs_year)

drop if fsp_merge == 2		
		
tab stfips fsp_merge, m	
		
tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]
		
** SAMPLE RESTRICTION

** Drop if no current FSP data
keep if fsp_merge == 3
keep if fs_year != .

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

** Drop if no YOB FSP information 
keep if fsp_yob_merge == 3
keep if fs_yearYOB != . 

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

drop fsp_yob_merge fsp_merge 

** Drop if mob == 99 or mob == 0
keep if mob != 99
keep if mob != 0
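
** Suggested check (an addition, not part of the original pipeline): the final 
** file should still be one row per person-year, with month of birth never 
** coded 0 or 99. 
isid id year
assert !inlist(mob, 0, 99)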

tab year if first_weight_obs == 1
tab year if first_weight_obs == 1 [aw = weight]

save ${data}psid_individual_wsens_fsp, replace

clear


